Mining Semantic Loop Idioms from Big Code

نویسندگان

  • Miltiadis Allamanis
  • Earl T. Barr
  • Christian Bird
  • Premkumar Devanbu
  • Mark Marron
  • Charles Sutton
چکیده

During maintenance, developers spend a lot of time transforming existing code: refactoring, optimizing, and adding checks to make it more robust. Much of this work is the drudgery of identifying and replacing specific patterns, yet it resists automation, because of meaningful patterns are hard to automatically find. We present a technique for mining loop idioms, surprisingly probable semantic patterns that occur in loops, from big code to find meaningful patterns. First, we show that automatically identifiable patterns exist, in great numbers, with a large scale empirical study of loop over 25 MLOC. We find that loops in this corpus are simple and predictable: 90% of them have fewer than 15LOC and 90% have no nesting and very simple control structure. Encouraged by this result, we coil loops to abstract away syntactic diversity to define information rich loop idioms. We show that only 50 loop idioms cover 50% of the concrete loops. We show how loop idioms can help a tool developers identify and prioritize refactorings. We also show how our framework opens the door to data-driven tool and language design discovering opportunities to introduce new API calls and language constructs: loop idioms show that LINQ would benefit from an Enumerate operator, a result confirmed by the fact that precisely this feature is one of the most requested features on StackOverflow with 197 votes and 95k views.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining of Source Code Concepts and Idioms An Approach based on Clone Detection Techniques

This paper introduces a new view on program source code with a focus on code clone information. An algorithm is presented that transforms source code into an equivalent representation which expresses code redundancies as hierarchical clone classes explicitly. This representation supports program comprehension by pointing out arbitrary programming idioms and the frequencies of their occurrences ...

متن کامل

Towards Maintenance Support for Idiom-based Code Using Sequential Pattern Mining

Developers often use an idiom to implement a concern. When a fault is found in an idiom, developers have to find all source code fragments derived from the original. While code-clone detection tools can detect copy-andpasted code, such tools cannot detect code fragments modified after pasted. We are investigating a sequential pattern mining approach to capture idiom-based code that spread acros...

متن کامل

Computing Idioms Frequency in Text Corpora

The idioms are phrases which meaning is not composed from the meanings of each word in the phrase. This is one of the natural examples of violating the principle of compositionality that means that idioms are in area of natural language processing problem of meaning mining. To count the frequency of phrases such idioms in corpora has one big aim: To get to know which phrases we use often and wh...

متن کامل

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are highranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur, typically, are more affective and ...

متن کامل

Big Data Mining and Semantic Technologies: Challenges and Opportunities

Big data a term coined due to the explosion in the quantity and diversity of high frequency digital data which is having a potential for valuable insights has drawn the most attention in the area of research and development. Converting big data to actionable insights requires depth understanding of big data, its characteristics, challenges and current technological trends. A rise of big data is...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016